36 research outputs found

    Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers

    Get PDF
    We consider clustering problems with non-uniform lower bounds and outliers, and obtain the first approximation guarantees for these problems. We have a set F of facilities with lower bounds {L_i}_{i in F} and a set D of clients located in a common metric space {c(i,j)}_{i,j in F union D}, and bounds k, m. A feasible solution is a pair (S subseteq F, sigma: D -> S union {out}), where sigma specifies the client assignments, such that |S| = L_i for all i in S, and |sigma^{-1}(out)| <= m. In the lower-bounded min-sum-of-radii with outliers P (LBkSRO) problem, the objective is to minimize sum_{i in S} max_{j in sigma^{-1})i)}, and in the lower-bounded k-supplier with outliers (LBkSupO) problem, the objective is to minimize max_{i in S} max_{j in sigma^{-1})i)} c(i,j). We obtain an approximation factor of 12.365 for LBkSRO, which improves to 3.83 for the non-outlier version (i.e., m = 0). These also constitute the first approximation bounds for the min-sum-of-radii objective when we consider lower bounds and outliers separately. We apply the primal-dual method to the relaxation where we Lagrangify the |S| <= k constraint. The chief technical contribution and novelty of our algorithm is that, departing from the standard paradigm used for such constrained problems, we obtain an O(1)-approximation despite the fact that we do not obtain a Lagrangian-multiplier-preserving algorithm for the Lagrangian relaxation. We believe that our ideas have broader applicability to other clustering problems with outliers as well. We obtain approximation factors of 5 and 3 respectively for LBkSupO and its non-outlier version. These are the first approximation results for k-supplier with non-uniform lower bounds

    Further Approximations for Demand Matching: Matroid Constraints and Minor-Closed Graphs

    Get PDF
    We pursue a study of the Generalized Demand Matching problem, a common generalization of the b-Matching and Knapsack problems. Here, we are given a graph with vertex capacities, edge profits, and asymmetric demands on the edges. The goal is to find a maximum-profit subset of edges so the demands of chosen edges do not violate the vertex capacities. This problem is APX-hard and constant-factor approximations are already known. Our main results fall into two categories. First, using iterated relaxation and various filtering strategies, we show with an efficient rounding algorithm that if an additional matroid structure M is given and we further only allow sets that are independent in M, the natural LP relaxation has an integrality gap of at most 25/3. This can be further improved in various special cases, for example we improve over the 15-approximation for the previously- studied Coupled Placement problem [Korupolu et al. 2014] by giving a 7-approximation. Using similar techniques, we show the problem of computing a minimum-cost base in M satisfying vertex capacities admits a (1,3)-bicriteria approximation: the cost is at most the optimum and the capacities are violated by a factor of at most 3. This improves over the previous (1,4)-approximation in the special case that M is the graphic matroid over the given graph [Fukanaga and Nagamochi, 2009]. Second, we show Demand Matching admits a polynomial-time approximation scheme in graphs that exclude a fixed minor. If all demands are polynomially-bounded integers, this is somewhat easy using dynamic programming in bounded-treewidth graphs. Our main technical contribution is a sparsification lemma that allows us to scale the demands of some items to be used in a more intricate dynamic programming algorithm, followed by some randomized rounding to filter our scaled-demand solution to one whose original demands satisfy all constraints

    Algorithms for Inverse Optimization Problems

    Get PDF
    We study inverse optimization problems, wherein the goal is to map given solutions to an underlying optimization problem to a cost vector for which the given solutions are the (unique) optimal solutions. Inverse optimization problems find diverse applications and have been widely studied. A prominent problem in this field is the inverse shortest path (ISP) problem [D. Burton and Ph.L. Toint, 1992; W. Ben-Ameur and E. Gourdin, 2004; A. Bley, 2007], which finds applications in shortest-path routing protocols used in telecommunications. Here we seek a cost vector that is positive, integral, induces a set of given paths as the unique shortest paths, and has minimum l_infty norm. Despite being extensively studied, very few algorithmic results are known for inverse optimization problems involving integrality constraints on the desired cost vector whose norm has to be minimized. Motivated by ISP, we initiate a systematic study of such integral inverse optimization problems from the perspective of designing polynomial time approximation algorithms. For ISP, our main result is an additive 1-approximation algorithm for multicommodity ISP with node-disjoint commodities, which we show is tight assuming P!=NP. We then consider the integral-cost inverse versions of various other fundamental combinatorial optimization problems, including min-cost flow, max/min-cost bipartite matching, and max/min-cost basis in a matroid, and obtain tight or nearly-tight approximation guarantees for these. Our guarantees for the first two problems are based on results for a broad generalization, namely integral inverse polyhedral optimization, for which we also give approximation guarantees. Our techniques also give similar results for variants, including l_p-norm minimization of the integral cost vector, and distance-minimization from an initial cost vector

    Approximation Algorithms for Clustering and Facility Location Problems

    Get PDF
    Facility location problems arise in a wide range of applications such as plant or warehouse location problems, cache placement problems, and network design problems, and have been widely studied in Computer Science and Operations Research literature. These problems typically involve an underlying set F of facilities that provide service, and an underlying set D of clients that require service, which need to be assigned to facilities in a cost-effective fashion. This abstraction is quite versatile and also captures clustering problems, where one typically seeks to partition a set of data points into k clusters, for some given k, in a suitable way, which themselves find applications in data mining, machine learning, and bioinformatics. Basic variants of facility location problems are now relatively well-u nderstood, but we have much-less understanding of more-sophisticated models that better model the real-world concerns. In this thesis, we focus on three models inspired by some real-world optimization scenarios. In Chapter 2, we consider mobile facility location (MFL) problem, wherein we seek to relocate a given set of facilities to destinations closer to the clients as to minimize the sum of facility-movement and client-assignment costs. This abstracts facility-location settings where one has the flexibility of moving facilities from their current locations to other destinations so as to serve clients more efficiently by reducing their assignment costs. We give the first local-search based approximation algorithm for this problem and achieve the best-known approximation guarantee. Our main result is (3+epsilon)-approximation for this problem for any constant epsilon > 0 using local search which improves the previous best guarantee of 8-approximation algorithm due to [34] based on LP-rounding. Our results extend to the weighted generalization wherein each facility i has a non-negative weight w_i and the movement cost for i is w_i times the distance traveled by i. In Chapter 3, we consider a facility-location problem that we call the minimum-load k-facility location (MLkFL), which abstracts settings where the cost of serving the clients assigned to a facility is incurred by the facility. This problem was studied under the name of min-max star cover in [32,10], who (among other results) gave bicriteria approximation algorithms for MLkFL when F=D. MLkFL is rather poorly understood, and only an O(k)-approximation is currently known for MLkFL, even for line metrics. Our main result is the first polytime approximation scheme (PTAS) for MLkFL on line metrics (note that no non-trivial true approximation of any kind was known for this metric). Complementing this, we prove that MLkFL is strongly NP-hard on line metrics. In Chapter 4, we consider clustering problems with non-uniform lower bounds and outliers, and obtain the first approximation guarantees for these problems. We consider objective functions involving the radii of open facilities, where the radius of a facility i is the maximum distance between i and a client assigned to it. We consider two problems: minimizing the sum of the radii of the open facilities, which yields the lower-bounded min-sum-of-radii with outliers (LBkSRO) problem, and minimizing the maximum radius, which yields the lower-bounded k-supplier with outliers (LBkSupO) problem. We obtain an approximation factor of 12.365 for LBkSRO, which improves to 3.83 for the non-outlier version. These also constitute the first approximation bounds for the min-sum-of-radii objective when we consider lower bounds and outliers separately. We obtain approximation factors of 5 and 3 respectively for LBkSupO and its non-outlier version. These are the first approximation results for k-supplier with non-uniform lower bounds

    Distributed Load Balancing: A New Framework and Improved Guarantees

    Get PDF

    Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

    Full text link
    The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architecture with lightweight experts.Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning by only updating the lightweight experts -- less than 1% of an 11B parameters model. Furthermore, our method generalizes to unseen tasks as it does not depend on any prior task knowledge. Our research underscores the versatility of the mixture of experts architecture, showcasing its ability to deliver robust performance even when subjected to rigorous parameter constraints. Our code used in all the experiments is publicly available here: https://github.com/for-ai/parameter-efficient-moe

    Approximation Algorithms for Minimum-Load k-Facility Location

    Get PDF
    We consider a facility-location problem that abstracts settings where the cost of serving the clients assigned to a facility is incurred by the facility. Formally, we consider the minimum-load k-facility location (MLkFL) problem, which is defined as follows. We have a set F of facilities, a set C of clients, and an integer k > 0. Assigning client j to a facility f incurs a connection cost d(f, j). The goal is to open a set F\u27 of k facilities, and assign each client j to a facility f(j) in F\u27 so as to minimize maximum, over all facilities in F\u27, of the sum of distances of clients j assigned to F\u27 to F\u27. We call this sum the load of facility f. This problem was studied under the name of min-max star cover in [6, 2], who (among other results) gave bicriteria approximation algorithms for MLkFL for when F = C. MLkFL is rather poorly understood, and only an O(k)-approximation is currently known for MLkFL, even for line metrics. Our main result is the first polynomial time approximation scheme (PTAS) for MLkFL on line metrics (note that no non-trivial true approximation of any kind was known for this metric). Complementing this, we prove that MLkFL is strongly NP-hard on line metrics. We also devise a quasi-PTAS for MLkFL on tree metrics. MLkFL turns out to be surprisingly challenging even on line metrics, and resilient to attack by the variety of techniques that have been successfully applied to facility-location problems. For instance, we show that: (a) even a configuration-style LP-relaxation has a bad integrality gap; and (b) a multi-swap k-median style local-search heuristic has a bad locality gap. Thus, we need to devise various novel techniques to attack MLkFL. Our PTAS for line metrics consists of two main ingredients. First, we prove that there always exists a near-optimal solution possessing some nice structural properties. A novel aspect of this proof is that we first move to a mixed-integer LP (MILP) encoding the problem, and argue that a MILP-solution minimizing a certain potential function possesses the desired structure, and then use a rounding algorithm for the generalized-assignment problem to "transfer" this structure to the rounded integer solution. Complementing this, we show that these structural properties enable one to find such a structured solution via dynamic programming

    Comparison of the extracellular full-length and truncated recombinant protein A production in Escherichia coli BL21 (DE3)

    Get PDF
         Protein A is a commercially important protein in biotechnological and medicinal applications. The great value of this protein and its applications in genetic and protein engineering and microbial researches as well as the growing use in biochemical industries, biotechnology, medicine and pharmacology, highlight the importance of the present study. In this survey the encoding genes of full-length and truncated forms of protein A were expressed in E. coli under an optimized expression condition. Optimization of the culture conditions resulted in an increase in expression and secretion of both forms of the protein, the pattern of expression and secretion levels for two forms was completely different. A minimum of 10-fold higher expression was observed for the truncated protein in comparison to that of the full-length recombinant form. Hydropathy plot of both forms of proteins showed that the missing domains in the truncated form contain groups of amino acids with high hydrophobicity score. Deletion of the terminal region could led to a higher expression level of the recombinant protein in E. coli. The function of these two proteins was studied using ELISA, which showed a higher activity for the truncated form for binding to IgG, compared to the full-length protein.
    corecore